Skip to content

Conversation

sliverc
Copy link

@sliverc sliverc commented Sep 9, 2025

Fixes #291

Added transactional safety when writing to redis to avoid corrupted data when multiple processes write to the database. This only works if the complete object is stored. As the redis pipeline itself stores the object quicker as it does it all in one go, I do not expect any performance issues through this change not to have dirty fields.

I would have loved to set up a test but was not successful to reproduce it in a unit test, as it not always occurs. But the tests still run like before, so confident that it does not break more than before.

@cunla Let me know what you think as you have a deeper inside into the project and whether this change makes sense from your point of view.

@sliverc
Copy link
Author

sliverc commented Sep 9, 2025

To note with this change: it only fixes that no corrupt data is written into the redis database. In case if there is already invalid data in the redis database, the exception would stll be raised.

Either this data would need to be removed from redis directly or a PR implemented which ignores invalid models. I am happy to provide a PR in this direction but thought we first make sure this fix looks good and see how to move forward.

if self._parent_key is not None:
connection.sadd(self._parent_key, self.name)
mapping = self.serialize(with_nones=True)
if not self._save_all and len(self._dirty_fields) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this logic?

@cunla
Copy link
Member

cunla commented Sep 14, 2025

This does not really fix the problem, though I do like some of these changes and adopted them.

The cause is the connection needs to be refreshed after fork.

@sliverc
Copy link
Author

sliverc commented Sep 16, 2025

Thanks for looking into this. Very much appreciated. We had a test run of your changes, and it seems the error now moved from JobModel to WorkerModel

web-pms_1           | Failed to deserialize 6880ac62150e-worker.1: WorkerModel.__init__() missing 8 required keyword-only arguments: 'name', 'queue_names', 'pid', 'hostname', 'ip_address', 'version', 'python_version', and 'state'

That's what we get in the scheduler output. I will have a closer look as well, but wanted to let you know. It could very well be that with the connection refresh that this fix works but something is missing when it comes to WorkerModel.

@sliverc
Copy link
Author

sliverc commented Sep 16, 2025

Just saw this that you committed 5096566 maybe that is the fix already for the worker model issue I outlined above?

We used the following commit for a test run 06bf0da

Are you still working on it, or how confident are you with the changes you made? We would love to test run it and see how well it works, but would be good to know when the code is in a state when it makes sense to test it again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error with deserialize JobModel
2 participants